-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework client, connect, to simplify and prepare for simpler healing. #1083
Rework client, connect, to simplify and prepare for simpler healing. #1083
Conversation
Signed-off-by: Ed Warnicke <hagbard@gmail.com>
e6faea9
to
c7c2da4
Compare
Signed-off-by: Ed Warnicke <hagbard@gmail.com>
pkg/tools/sandbox/dial_options.go
Outdated
@@ -57,7 +57,6 @@ func DialOptions(options ...DialOption) []grpc.DialOption { | |||
), | |||
grpc.WithBlock(), | |||
grpc.WithDefaultCallOptions( | |||
grpc.WaitForReady(true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Bolodya1997 I removed this line because without it
sdk/pkg/networkservice/chains/nsmgr/suite_test.go
Lines 310 to 356 in 07ced33
func (s *nsmgrSuite) Test_ConnectToDeadNSEUsecase() { | |
t := s.T() | |
ctx, cancel := context.WithTimeout(context.Background(), time.Second*5) | |
defer cancel() | |
nsReg, err := s.nsRegistryClient.Register(ctx, defaultRegistryService()) | |
require.NoError(t, err) | |
nseReg := defaultRegistryEndpoint(nsReg.Name) | |
counter := new(count.Server) | |
nse := s.domain.Nodes[0].NewEndpoint(ctx, nseReg, sandbox.GenerateTestToken, counter) | |
nsc := s.domain.Nodes[0].NewClient(ctx, sandbox.GenerateTestToken) | |
request := defaultRequest(nsReg.Name) | |
reqCtx, reqCancel := context.WithTimeout(ctx, time.Second) | |
defer reqCancel() | |
conn, err := nsc.Request(reqCtx, request.Clone()) | |
require.NoError(t, err) | |
require.NotNil(t, conn) | |
require.Equal(t, 1, counter.Requests()) | |
require.Equal(t, 5, len(conn.Path.PathSegments)) | |
nse.Cancel() | |
// Simulate refresh from client | |
refreshRequest := request.Clone() | |
refreshRequest.Connection = conn.Clone() | |
refreshReqCtx, refreshReqCancel := context.WithTimeout(ctx, time.Second) | |
defer refreshReqCancel() | |
_, err = nsc.Request(refreshReqCtx, refreshRequest) | |
require.Error(t, err) | |
require.NoError(t, reqCtx.Err()) | |
// Close | |
_, _ = nsc.Close(ctx, conn) | |
// Endpoint unregister | |
_, err = s.domain.Nodes[0].NSMgr.NetworkServiceEndpointRegistryServer().Unregister(ctx, nseReg) | |
require.NoError(t, err) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What the error do you see in Test_ConnectToDeadNSEUsecase with grpc.WaitForReady(true)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose that there are some reasons why we may want to use WaitForReady
flag, but I am not actually sure because we most probably had started using it before I have started working on NSM :)
There was an issue with Request hanging in such case and so it is the reason why do we have such test.
We solved this issue in connect
several times with several approaches during the heal
development. Currently connect
uses heal
client (or some other monitoring client) to understand that remote side is dead and so close the gRPC connection to prevent Request hanging with WaitForReady
flag set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@edwarnicke
Any update for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've restored the grpc.WaitForReady(true)
98f7de4
to
62bf2d6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like parts with simplifications related to dial/clientconn, but it this PR also removes heal. So it seems to me to merge this PR we also need to do one of these options:
- Remove/Disable all heal tests.
- Wait for heal rework.
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
package nsmgr_test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to remove heal tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we don't have the heal replacement yet :) Will revive them when that comes in.
) | ||
|
||
// ExampleForwarder - example of how to use the connect chain element in a forwarder | ||
func ExampleNewServer() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like idea on adding golang examples,
Do we need to add some asserts and outputs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would make it more of a test wouldn't it?
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
package connect_test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests cover many corner cases. Can we keep them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally agree with Denis, most of our connect
tests have been motivated by some painful issues. Removing them will most probably lead us back to the old problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Bolodya1997 Added the tests back... and yes, they caught some issues :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added few missed goleak
checks to connect
tests (#1087), please can you rebase your tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/tools/sandbox/dial_options.go
Outdated
@@ -57,7 +57,6 @@ func DialOptions(options ...DialOption) []grpc.DialOption { | |||
), | |||
grpc.WithBlock(), | |||
grpc.WithDefaultCallOptions( | |||
grpc.WaitForReady(true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What the error do you see in Test_ConnectToDeadNSEUsecase with grpc.WaitForReady(true)
?
} | ||
|
||
// WithoutRefresh disables refresh | ||
func WithoutRefresh() Option { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we have different token timeout for the different applications?
NSC(1h) -> NSMgr(30m) -> Forwarder(15m) -> NSMgr(30m) -> NSE
If we use such option on NSMgr, Forwarder
we should probably compute the least token timeout in refresh
on NSC
to become sure that there will be no timeouts in the chain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably it can be some rare case for the local scenario, because most probably all of the applications share the same spire server with same SVID TTL for all of them, but it also can be an interdomain scenario with different spire federations and so different SVID TTLs for local and remote applications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Bolodya1997 You are completely correct we need to watch out for this. We could do as you suggested and compute that in refresh. I'm tempted though to do it in updatetoken by clamping down expiretime to the smallest in the path on the return direction of the path.
Signed-off-by: Ed Warnicke <hagbard@gmail.com>
Signed-off-by: Ed Warnicke <hagbard@gmail.com>
and also networkservicemesh#1083 (comment) Signed-off-by: Ed Warnicke <hagbard@gmail.com>
Signed-off-by: Ed Warnicke <hagbard@gmail.com>
Signed-off-by: Ed Warnicke <hagbard@gmail.com>
Signed-off-by: Ed Warnicke <hagbard@gmail.com>
Signed-off-by: Ed Warnicke <hagbard@gmail.com>
Signed-off-by: Ed Warnicke <hagbard@gmail.com>
Signed-off-by: Ed Warnicke <hagbard@gmail.com>
Signed-off-by: Vladimir Popov <vladimir.popov@xored.com>
Signed-off-by: Vladimir Popov <vladimir.popov@xored.com>
Signed-off-by: Vladimir Popov <vladimir.popov@xored.com>
Signed-off-by: Vladimir Popov <vladimir.popov@xored.com>
Signed-off-by: Vladimir Popov <vladimir.popov@xored.com>
…k-vpp@main PR link: networkservicemesh/sdk-vpp#386 Commit: d52a3eb Author: Ed Warnicke Date: 2021-10-14 01:04:19 -0500 Message: - Adapt to networkservicemesh/sdk#1083 (#386) Signed-off-by: NSMBot <nsmbot@networkservicmesh.io>
…k-vpp@main PR link: networkservicemesh/sdk-vpp#386 Commit: d52a3eb Author: Ed Warnicke Date: 2021-10-14 01:04:19 -0500 Message: - Adapt to networkservicemesh/sdk#1083 (#386) Signed-off-by: NSMBot <nsmbot@networkservicmesh.io>
Manually update to networkservicemesh/sdk#1083
Signed-off-by: Vladimir Popov <vladimir.popov@xored.com>
…k-ovs@main PR link: networkservicemesh/sdk-ovs#43 Commit: a199406 Author: Vladimir Popov Date: 2021-10-14 13:59:29 +0700 Message: - Manually update to networkservicemesh/sdk#1083 (#43) Signed-off-by: NSMBot <nsmbot@networkservicmesh.io>
Signed-off-by: Vladimir Popov <vladimir.popov@xored.com>
Signed-off-by: Vladimir Popov <vladimir.popov@xored.com>
…d-nsc@main PR link: https://github.com/networkservicemesh/cmd-nsc/pull/ Commit: adc8748 Author: Vladimir Popov Date: 2021-10-14 16:23:34 +0700 Message: - Manually update to networkservicemesh/sdk#1083 (#290) Signed-off-by: NSMBot <nsmbot@networkservicmesh.io>
…d-nsc-vpp@main PR link: https://github.com/networkservicemesh/cmd-nsc-vpp/pull/ Commit: 4371e4d Author: Vladimir Popov Date: 2021-10-14 16:23:46 +0700 Message: - Manually update to networkservicemesh/sdk#1083 (#267) Signed-off-by: NSMBot <nsmbot@networkservicmesh.io>
…d-nsc-vpp@main PR link: https://github.com/networkservicemesh/cmd-nsc-vpp/pull/ Commit: 4371e4d Author: Vladimir Popov Date: 2021-10-14 16:23:46 +0700 Message: - Manually update to networkservicemesh/sdk#1083 (#267) Signed-off-by: NSMBot <nsmbot@networkservicmesh.io>
Signed-off-by: Ed Warnicke <hagbard@gmail.com>
Signed-off-by: Ed Warnicke <hagbard@gmail.com>
…d-forwarder-vpp@main PR link: https://github.com/networkservicemesh/cmd-forwarder-vpp/pull/ Commit: 5494d34 Author: Ed Warnicke Date: 2021-10-14 10:05:42 -0500 Message: - Adapt to Adapt to networkservicemesh/sdk#1083 (#342) Signed-off-by: NSMBot <nsmbot@networkservicmesh.io>
Previously we had a single 'client' per grpc.ClientConnInterface.
This created a lot of complexity, particularly in connect.
This rework of client moves to a single 'client' that can handle
multiple grpc.ClientConnInterfaces.
This vastly simplified both the client chain and connect chain.
Finally, heal has been temporarily removed in preparation for a rework
of heal focused on back propogating Connection 'DOWN' events via
monitor. The net net at the end of the process should be:
of passthrough NSEs
pf passthrough NSEs.
Signed-off-by: Ed Warnicke hagbard@gmail.com
Description
Issue link
How Has This Been Tested?
Types of changes